Abstract The heterotetrameric AP and F-COPI complexes help to define the cellular map of
modern eukaryotes. To search for related machinery, we developed a structure-based 
bioinformatics tool, and identified the core subunits of TSET, a 'missing link' between the APs and 
COPI. Studies in Dictyostelium indicate that TSET is a heterohexamer, with two associated 
scaffolding proteins. TSET is non-essential in Dictyostelium, but may act in plasma membrane 
turnover, and is essentially identical to the recently described TPLATE complex, TPC. However, 
whereas TPC was reported to be plant-specific, we can identify a full or partial complex in every 
eukaryotic supergroup. An evolutionary path can be deduced from the earliest origins of the 
heterotetramer/scaffold coat to its multiple manifestations in modern organisms, including 
the mammalian muniscins, descendants of the TSET medium subunits. Thus, we have uncovered 
the machinery for an ancient and widespread pathway, which provides new insights into early 
eukaryotic evolution. 
DOI: 10.7554/eLife.02866.001 



Introduction 

The evolution of eukaryotes some 2 billion years ago radically changed the biosphere, giving rise to 
nearly all visible life on Earth. Key to this transition was the ability to generate intracellular membrane 
compartments and the trafficking pathways that interconnect them, mediated in part by the heterotetrameric
adaptor complexes, APs 1-5 and COPI (Dacks et al., 2008; Field and Dacks, 2009; Hirst
et al., 2011; Koumandou et al., 2013). In mammals, APs 1 and 2 and COPI are essential for viability,
while mutations in the other APs cause severe genetic disorders (Boehm and Bonifacino, 2002;
Hirst et al., 2013). The AP and COPI complexes share a similar architecture, due to common ancestry
predating the last eukaryotic common ancestor (LECA). All six complexes consist of two large subunits
of ~100 kD, a medium subunit of ~50 kD, and a small subunit of ~20 kD (Figure 1A). Their function is
to select cargo for packaging into transport vesicles, and together with membrane-deforming scaf- 
folding proteins such as clathrin and the COPI B-subcomplex, they facilitate the trafficking of proteins 
and lipids between membrane compartments in the secretory and endocytic pathways. The recent 
discovery of the evolutionarily ancient AP-5 complex, found on late endosomes and lysosomes, added 
a new dimension to models of the endomembrane system, and raised the possibility that other undetected
membrane-trafficking complexes might exist (Hirst et al., 2011). Therefore, we set out in
search of additional members of the AP/COPI subunit families. 
eLife digest Eukaryotes make up almost all of the life on Earth that we can see around us, and
include organisms as diverse as animals, fungi, plants, slime moulds, and seaweeds. The defining 
feature of eukaryotes is that, unlike nearly all bacteria, they have membrane-bound compartments — 
such as the nucleus — within their cells. 

Moving molecules, such as proteins, between these compartments is essential for living eukaryotic 
cells, and these molecules are usually trafficked inside membrane-bound packages called vesicles. 
Two similar sets of protein complexes — each containing four different subunits — ensure that the 
molecules are packaged inside the correct vesicles. However, it is not clear how these two protein 
complexes (called the AP complexes and the COPI complex) are related to each other, and when 
and where they originated in the history of life. 

Now, Hirst, Schlacht et al. have discovered a new, but very ancient, protein complex that they
refer to as the 'missing link' between the AP and COPI complexes. The four subunits inside this new 
complex were found by searching for proteins with shapes that were similar to those of the AP and 
COPI proteins, rather than just searching for proteins with similar sequences of amino acids. This 
approach identified related protein subunits in groups as diverse as plants and slime moulds, which 
suggests that this protein complex evolved in the earliest of the eukaryotes. The four subunits 
identified in a slime mould were confirmed to interact, and also shown to bind to the plasma 
membrane of living cells. 

One of the subunits had already been named TPLATE, so Hirst, Schlacht et al. decided to call the
complex TSET; the other three subunits were named TSAUCER, TCUP and TSPOON, and two other 
proteins that interacted with the complex were both called TTRAY. 

While most of the TSET complex itself has been lost from humans and other animals, one of its
subunits appears to have evolved into a family of proteins that help molecules get into cells. The
discovery of TSET reveals another major player in vesicle-trafficking that is not only important for 
our understanding of how modern eukaryotes work, but also how ancient eukaryotes evolved. 
DOI: 10.7554/eLife.02866.002 



Results and discussion 

The search for novel AP-related complexes 

Because we were unable to find any promising candidates for new AP/COPI-related machinery using 
sequence-based searches, we developed a more sensitive tool, designed to search for structural 
similarity rather than sequence similarity. Using HHpred to analyse every protein in the RefSeq data- 
base from 15 organisms, covering a broad span of eukaryotic diversity, we built a 'reverse HHpred' 
database. This database contains potential homologues for >300,000 different proteins
(http://reversehhpred.cimr.cam.ac.uk), and can be searched with structures from the Protein Data Bank
(PDB). As proof of principle, we used this database to identify all four subunits of the AP-5 complex
(Figure 1 — figure supplements 1 and 2; Figure 1 — source data 1, 2), even though in our previous
study only the medium subunit was initially detectable by bioinformatics-based searching (Hirst
et al., 2011).
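
The "reverse" lookup described above can be illustrated with a minimal sketch. It assumes that forward HHpred results have already been parsed into a mapping from each query protein to its PDB hits and their probabilities. The function name, the data layout, the probability cutoff, and the identifiers in the example are all illustrative assumptions, not the implementation behind the database described here.

```python
from collections import defaultdict


def build_reverse_index(forward_hits, min_probability=50.0):
    """Invert protein -> [(pdb_id, probability)] into pdb_id -> [(protein, probability)].

    forward_hits is a hypothetical, pre-parsed summary of per-protein HHpred runs.
    min_probability is an arbitrary cutoff chosen for this sketch.
    """
    reverse = defaultdict(list)
    for protein, hits in forward_hits.items():
        for pdb_id, probability in hits:
            if probability >= min_probability:
                reverse[pdb_id].append((protein, probability))
    # Put the most confident candidates first for each PDB entry.
    for pdb_id in reverse:
        reverse[pdb_id].sort(key=lambda item: item[1], reverse=True)
    return dict(reverse)


# Made-up identifiers and scores, for illustration only.
forward_hits = {
    "protein_A": [("pdb_X", 99.1), ("pdb_Y", 42.0)],
    "protein_B": [("pdb_X", 87.5)],
    "protein_C": [("pdb_Y", 95.0)],
}
reverse_index = build_reverse_index(forward_hits)
```

Querying `reverse_index` with a PDB identifier then returns every protein whose forward search matched that structure, which is the direction of search used to look for new family members.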

In addition to known proteins, our reverse HHpred database revealed novel candidates for each 
of the four subunit families, with orthologues present in diverse eukaryotes including plants and 
Dictyostelium (Figure 1 — figure supplements 2-4, Figure 1 — source data 1, 2). Secondary structure
predictions confirmed that the new family members have similar folds to their counterparts in the AP
complexes and COPI (Figure 1B). Only one of these proteins had been characterised functionally:
TPLATE (NP_186827.2), an Arabidopsis protein related to the AP β subunits and β-COP, found in a
microscopy-based screen for proteins involved in mitosis and localised to the cell plate (Van Damme
et al., 2006; Van Damme et al., 2011). There is some variability between orthologous subunits in
different organisms: for instance, Arabidopsis has added an SH3 domain to the C-terminal end of its
γαδεζ large subunit, while Dictyostelium has lost the μ homology domain (MHD) at the end of its
medium subunit; and in general there seems to be much less selective pressure on these genes than
on those encoding other AP/COPI family members (e.g., the AP-1 μ1 subunits are 58.01% identical in
Dictyostelium and Arabidopsis, while the new μ family members are only 14.63% identical).
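
Percent identity figures of this kind depend on how the alignment and the denominator are defined. The sketch below shows one common convention, assuming the two sequences are already aligned to equal length and that columns containing a gap in either sequence are ignored. The text does not state which convention was used, so the function and the toy sequences are illustrative only.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy aligned sequences, not real subunit sequences.
example = percent_identity("MKT-LLV", "MRTALLI")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives lower values for gappy alignments.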
Figure 1. Diagrams of APs and F-COPI. (A) Structures of the assembled complexes. All six complexes are heterotetramers;
the individual subunits are called adaptins in the APs (e.g., γ-adaptin) and COPs in COPI (e.g., γ-COP). The
two large subunits in each complex are structurally similar to each other. They are arranged with their N-terminal
domains in the core of the complex, and these domains are usually (but not always) followed by a flexible linker
and an appendage domain. The medium subunits consist of an N-terminal longin-related domain followed by a
C-terminal μ homology domain (MHD). The small subunits consist of a longin-related domain only. (B) Jpred
secondary structure predictions of some of the known subunits (all from Homo sapiens), together with new
family members from Dictyostelium discoideum (Dd) and Arabidopsis thaliana (At). See also Figure 1 — figure
supplements 1-4, Figure 1 — source data 1, 2. 
DOI: 10.7554/eLife.02866.003 

The following source data and figure supplements are available for figure 1:

Source data 1. Large subunit homologues found by reverse HHpred in different organisms. 

DOI: 10.7554/eLife.02866.004 

Source data 2. Medium and small subunit homologues found by reverse HHpred in different organisms. 
DOI: 10.7554/eLife.02866.005 

Figure supplement 1. PDB entries used to search for adaptor-related proteins. 
DOI: 10.7554/eLife.02866.006

Figure supplement 2. Summary table of all subunits identified using reverse HHpred. 
DOI: 10.7554/eLife.02866.007 

Figure supplement 3. Subunits that failed to be identified using reverse HHpred, but were identified by homology 
searching using NCBI BLAST. 
DOI: 10.7554/eLife.02866.008 

Figure supplement 4. TSET orthologues in different species. 
DOI: 10.7554/eLife.02866.009 

Figure supplement 5. Identification of ENTH/ANTH domain proteins and the AP complexes with which they
associate, using reverse HHpred. 
DOI: 10.7554/eLife.02866.010 



TSET: a new trafficking complex 

To determine whether the four new candidate subunits identified in our searches actually form a complex,
we transformed D. discoideum with a GFP-tagged version of its small (σ-like) subunit (Figure 2A),
and then used anti-GFP to immunoprecipitate the construct and any associated proteins from cell
extracts (Figure 2B). Precipitates were analysed by mass spectrometry, yielding ten proteins considered
to be specifically immunoprecipitated (Figure 2 — figure supplement 1A). Two of these were the
small subunit itself and its GFP tag. Three others were the remaining candidate subunits: XP_639969.1
(the β-like subunit), XP_640471.1 (the γαδεζ-like subunit), and XP_629998.1 (the μ-like subunit),
confirming their presence in a complex. Quantification by iBAQ indicated that these three proteins were
present in the immunoprecipitate at approximately equimolar levels (Figure 2C, Figure 2 — figure
supplement 1A), while the small subunit and GFP tag were in ~15-fold molar excess, probably due to
overexpression.
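
For readers unfamiliar with iBAQ, the estimate can be sketched as follows: the summed peptide intensity for a protein is divided by its number of theoretically observable peptides, and the resulting values are compared between proteins as approximate molar ratios. The numbers and protein labels below are invented for illustration and are not data from this study; the helper names are likewise hypothetical.

```python
def ibaq(summed_peptide_intensity, n_observable_peptides):
    """iBAQ value: summed intensity divided by the count of theoretically observable peptides."""
    if n_observable_peptides <= 0:
        raise ValueError("number of observable peptides must be positive")
    return summed_peptide_intensity / n_observable_peptides


def molar_ratios(ibaq_values, reference):
    """Express each iBAQ value relative to a chosen reference protein."""
    ref = ibaq_values[reference]
    return {name: value / ref for name, value in ibaq_values.items()}


# Made-up (intensity, observable peptide count) pairs, for illustration only.
measurements = {
    "subunit_1": (3.0e9, 30),
    "subunit_2": (6.0e9, 60),
    "tagged_bait": (1.5e10, 10),
}
ibaq_values = {name: ibaq(i, n) for name, (i, n) in measurements.items()}
ratios = molar_ratios(ibaq_values, reference="subunit_1")
```

In this toy case the two untagged subunits come out equimolar, while the overexpressed bait is in 15-fold excess, mirroring the kind of pattern described in the text.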

Interestingly, two of the other proteins in the immunoprecipitate, also approximately equimolar to 
the three coprecipitating subunits, were XP_642289.1 and XP_637150.1. Both proteins are predicted 
to consist of two N-terminal β-propeller domains followed by an α-solenoid (Figure 2D, Figure 2 —
figure supplement 1C). This type of architecture is found in several coat components, including clathrin
heavy chain, SPG11 (associated with AP-5), the α-COP and β'-COP subunits of the COPI coat
(B-COPI), and the Sec31 subunit of the COPII coat (Devos et al., 2004). HHpred analyses show that
the closest matches for both XP_642289.1 and XP_637150.1 are β'-COP, followed by α-COP. Probable
orthologues of XP_642289.1 and XP_637150.1 can be found in other organisms that have the four
core subunits (Figure 1 — figure supplement 4). Because proteins with this architecture often act as a
coat for transport vesicles, we hypothesize that these proteins may provide a scaffold for the newly 
identified heterotetramer. 

The other three proteins in the immunoprecipitate, secG and vacuolins A and B, appear to be less 
widespread taxonomically (Figure 2 — figure supplements 2 and 3), but are nonetheless suggestive of
function. SecG is related to the plasma membrane- and endosome-associated ARNO/cytohesin family
of Arf GEFs in animal cells (Shina et al., 2010), and also appears to be equimolar with the core complex.
Vacuolins are members of the SPFH (stomatin-prohibitin-flotillin-HflC/K) superfamily. They have
been shown to associate with the late vacuole just before exocytosis and also with the plasma membrane
(Rauchenberger et al., 1997; Gotthardt et al., 2002), and to contribute to vacuole function
(Jenne et al., 1998). However, the amounts of coprecipitating vacuolins were more variable, suggesting
that they are less tightly associated with the complex (Figure 2 — figure supplement 1A). Thus,
like TPLATE, both SecG and the vacuolins have been implicated in membrane traffic, acting at the 
plasma membrane and/or endosomal compartments. 

As the β-like subunit is already named TPLATE, we propose similar nomenclature for the other three
subunits of the heterotetramer, relating to their relative sizes: TSAUCER, TCUP, and TSPOON. For the
two associated β-propeller/α-solenoid proteins, we propose TTRAY1 and TTRAY2, and for the conserved
heterohexamer, we propose the name TSET.

Characterisation of the TSET complex in Dictyostelium 

One of the key properties of coat proteins is their ability to cycle on and off membranes. Although by 
widefield fluorescence microscopy TSPOON-GFP looked diffuse and cytosolic (Figure 2 — figure
supplement 1B), TIRF imaging showed a punctate pattern, especially in the cells with lower expression,
Figure 2. Characterisation of the TSET complex in Dictyostelium. (A) Western blots of axenic D. discoideum
expressing either GFP-tagged small subunit (σ-like) or free GFP, under the control of the Actin15 promoter, labelled
with anti-GFP. The Ax2 parental cell strain was included as a control, and an antibody against the AP-2 α subunit was
used to demonstrate that equivalent amounts of protein were loaded. (B) Coomassie blue-stained gel of GFP-tagged
small subunit and associated proteins immunoprecipitated with anti-GFP. The GFP-tagged protein is indicated with
a red asterisk. (C) iBAQ ratios (an estimate of molar ratios) for the proteins that consistently coprecipitated with the
GFP-tagged small subunit. All appear to be equimolar with each other, and the higher ratios for the small (σ-like/TSPOON)
subunit and GFP are likely to be a consequence of their overexpression, which we also saw in a repeat
experiment in which we used the small subunit's own promoter (Figure 2 — figure supplement 1). (D) Predicted
structure of the N-terminal portion of D. discoideum TTRAY1, shown as a ribbon diagram. (E) Stills from live cell 
imaging of cells expressing either TSPOON-GFP or free GFP, using TIRF microscopy. The punctate labelling in the 
TSPOON-GFP-expressing cells indicates that some of the construct is associated with the plasma membrane. See 
Videos 1 and 2. (F) Western blots of extracts from cells expressing either TSPOON-GFP or free GFP. The post-nuclear
supernatants (PNS) were centrifuged at high speed to generate supernatant (cytosol) and pellet fractions.
Equal protein loadings were probed with anti-GFP. Whereas the GFP was exclusively cytosolic, a substantial 
proportion of TSPOON-GFP fractionated into the membrane-containing pellet. (G) Mean generation time (MGT) for 
control (Ax2) and TSPOON knockout cells. The knockout cells grew slightly faster than the control. (H) Differentiation 
of the Ax2 control strain and two TSPOON knockout strains (1725 and 1727). All three strains produced fruiting 
bodies upon starvation. (I) Assay for fluid phase endocytosis. The control and knockout strains took up FITC- 
dextran at similar rates. (J) Assay for endocytosis of membrane, labelled with FM1-43, showing the time taken to 
internalise the entire surface area. The knockout strains took significantly longer than the control (*p<0.05; **p<0.01). 
See also Figure 2 — figure supplements 1 and 2, Figure 2, Videos 1 and 2. 
DOI: 10.7554/eLife.02866.011 

The following figure supplements are available for figure 2: 

Figure supplement 1. Further characterisation of Dictyostelium TSET.

DOI: 10.7554/eLife.02866.012 

Figure supplement 2. Distribution of secG. 

DOI: 10.7554/eLife.02866.013

Figure supplement 3. Distribution of vacuolins. 

DOI: 10.7554/eLife.02866.014 



indicating that some of the construct is associated with the plasma membrane (Figure 2E, Video 1).
In contrast, free GFP appeared to be entirely cytosolic (Figure 2E, Video 2).
In addition, high speed centrifugation of a post-nuclear supernatant showed a substantial amount of
TSPOON-GFP coming down in the membrane-containing pellet, in contrast to free GFP, which was
exclusively in the supernatant (Figure 2F). These findings indicate that like other coat proteins, the
complex is transiently recruited onto a membrane (specifically, the plasma membrane) from a cytosolic 
pool. 

Silencing TPLATE in Arabidopsis produces a very severe phenotype, with impaired growth and
differentiation, thought to be caused by defects in clathrin-mediated endocytosis (Van Damme et al.,
2006; Van Damme et al., 2011). To investigate the function of TSET in Dictyostelium, we disrupted
the TSPOON gene by replacing most of the coding sequence with a selectable marker
(Figure 2 — figure supplement 1D). Surprisingly, the resulting knockout cells grew at least as fast
as the control axenic strain (Figure 2G shows the mean generation time); and differentiation also
appeared normal, with fruiting bodies forming under appropriate stimuli (Figure 2H). Uptake of
FITC-dextran, an assay for fluid phase endocytosis, was unimpaired in the TSPOON knockout
cells (Figure 2I); however, uptake of FM1-43, a membrane marker, was slower than in the control
(Figure 2J shows the time taken to internalise the entire surface area), indicating that TSET plays
a role in plasma membrane turnover, consistent with studies on Arabidopsis. Nevertheless, it is
clear that in contrast to Arabidopsis, Dictyostelium can thrive without a functional TSET complex.

Video 1. Related to Figure 2. TIRF microscopy of D. discoideum expressing TSPOON-GFP, expressed
off its own promoter in TSPOON knockout cells. One frame was collected every second. Dynamic puncta can
be seen, indicating that the construct forms patches at the plasma membrane.
DOI: 10.7554/eLife.02866.015

Video 2. Related to Figure 2. TIRF microscopy of D. discoideum expressing free GFP, driven by the
Actin15 promoter in TSPOON knockout cells. One frame was collected every second. The signal is diffuse
and cytosolic.
DOI: 10.7554/eLife.02866.016

Very recently, the discoverers of TPLATE used tandem affinity purification to identify TPLATE
binding partners, and found the Arabidopsis orthologues of the TSET components that we
identified independently in the present study (Gadeyne et al., 2014). The Arabidopsis pulldowns
did not contain any proteins resembling
secG or the vacuolins, supporting our hypo- 
thesis that these proteins are add-ons to the 
core heterohexamer. However, Arabidopsis TSET 
is associated with two additional proteins con- 
taining EH domains, which we did not find in our 
Dictyostelium pulldowns. Some of the Arabidopsis 
pulldowns also brought down components of 
the machinery for clathrin-mediated endocytosis, 
including clathrin itself. Although we also found 
clathrin and associated proteins in our Dictyostelium 
immunoprecipitates, these proteins were equally 
abundant in control immunoprecipitates from non- 
GFP-expressing cells, indicating that they were 
contaminants. The differences in proteins that 
coprecipitate with TSET in the two organisms are 
probably a reflection of functional differences: TSET 
knockouts in Arabidopsis are lethal and knock- 
downs profoundly affect clathrin-mediated endo- 
cytosis, while TSET knockdowns in Dictyostelium 
produce a very mild phenotype. 



TSET is ancient and widespread in eukaryotes 

When TPLATE was discovered in Arabidopsis, it was reported to be unique to plant species 
(Van Damme et al., 2006; Van Damme et al., 2011). Similarly, in the more recent Arabidopsis study, 
the authors concluded that the complex was plant-specific (Gadeyne et al., 2014). However, these
conclusions were based on analyses of plants, yeast, and humans only. Our identification and charac- 
terization of homologues of all six subunits in Dictyostelium discoideum, as well as their presence in 
the excavate Naegleria gruberi, suggested that the evolutionary distribution was much more exten- 
sive. In depth homology searching identified orthologues in genomes from across the broad diversity 
of eukaryotes (Figure 3, Figure 3 — source data 1, Figure 3 — figure supplement 1), strongly suggesting
that the complex was present prior to the LECA. 

Although TSET is clearly ancient, its relationship to the other heterotetrameric complexes was 
unclear from homology searching alone. Consequently, after analyses of the individual subunits 
(Figure 4 — figure supplements 1-7), we performed a phylogenetic analysis on the concatenated
set of the four core subunits for direct comparison of TSET with the other AP and COPI complexes
(Figure 4A, Figure 4 — figure supplement 8). This provided moderate support for TSET as a clade,
but strong resolution excluding it from the APs and COPI, as well as backbone resolution between the 
heterotetramer clades. Thus, TSET is clearly an ancient component of the eukaryotic membrane- 
trafficking system, distinct from the known heterotetramers. 
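
The concatenation step mentioned above can be sketched in a few lines. The example assumes that each subunit family has already been aligned separately and is held as a mapping from taxon name to aligned sequence. Padding a taxon that lacks a given subunit with gap characters is an assumption of this sketch, since the text here does not say how missing subunits were handled, and the function name and toy sequences are hypothetical.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-subunit alignments into one concatenated alignment per taxon.

    alignments is a list of dicts (taxon -> aligned sequence), one per subunit.
    Taxa missing from an alignment are padded with gaps of that alignment's width.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(parts) for taxon, parts in pieces.items()}


# Toy alignments for two hypothetical subunit families.
large_subunit = {"taxon_1": "MKV", "taxon_2": "MRV"}
medium_subunit = {"taxon_1": "AC--", "taxon_3": "ACDE"}
supermatrix = concatenate_alignments([large_subunit, medium_subunit])
```

Every taxon ends up with a sequence of the same total length, which is the input format that tree-building programs expect for a concatenated analysis.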

Phylogenetic analysis of the TTRAYs and their closest relatives, β'-COP and α-COP (Figure 4 —
figure supplement 9), showed that the paralogues are due to ancient duplications in the TSET and
COPI families respectively, which occurred prior to the divergence of the LECA. Together, these findings
imply that the ancestor of the TSET, COPI, and AP complexes was a heterohexamer rather than a
heterotetramer, consisting of five different proteins, with the two scaffolding proteins present as two
identical copies (Figure 4B,C). These scaffolding subunits then duplicated independently in COPI and
TSET. The ancestral AP complex may have lost its original scaffolding subunits, although AP-5, the first
AP to branch away, is closely associated with SPG11, a β-propeller + α-solenoid protein whose relationship
to the TTRAYs and B-COPI is as yet unclear. None of the other APs has any closely associated
proteins with this architecture, but AP-1 and AP-2 transiently interact with clathrin, and there may also
be a transient association between AP-3 and another β-propeller + α-solenoid protein, Vps41 (Rehling
et al., 1999; Cabrera et al., 2010; Asensio et al., 2013).

Although TSET is deduced to have been present in LECA, the complex appears to have been 
entirely or partially lost in various lineages (Figure 3B). None of the subunits has a full orthologue in
Figure 3. Distribution of TSET subunits. (A) Coulson plot showing the distribution of TSET in a diverse set of representative eukaryotes. Presence of the
entire complex in at least four supergroups suggests its presence in the last eukaryotic common ancestor (LECA) with frequent secondary loss. Solid
sectors indicate sequences identified and classified using BLAST and HMMer. Empty sectors indicate taxa in which no significant orthologues were



Hirst et al. eLife 2014;3:e02866. DOI: 10.7554/eLife.02866






Research article 



Cell biology | Genomics and evolutionary biology 




identified. Filled sectors in the Holozoa and Fungi represent F-BAR domain-containing FCHo and Syp1, respectively. Taxon name abbreviations are inset.
Names in bold indicate taxa with all six components. (B) Deduced evolutionary history of TSET as present in the LECA but independently lost multiple 
times, either partially or completely. See also Figure 3 — source data 1, Figure 3 — figure supplement 1. 

DOI: 10.7554/eLife.02866.017 

The following source data and figure supplements are available for figure 3: 
Source data 1. Sequences used for phylogenetic analyses. 
DOI: 10.7554/eLife.02866.018 

Figure supplement 1. Models used for phylogenetic analyses. 
DOI: 10.7554/eLife.02866.019 



opisthokonts (animals and fungi), indicating secondary loss in the line leading to humans. However, the 
C-terminal domain of TCUP is homologous to the C-terminal domains of the muniscins, opisthokont-specific
proteins (Gadeyne et al., 2014) (Figure 4 — figure supplement 10). This suggests that in
opisthokonts, the TCUP gene retained its 3' end, which then combined with a new 5' end encoding
an F-BAR domain to generate the muniscin family (Figure 4C). These include the vertebrate proteins
FCHo1/2 and the yeast protein Syp1, important players in the endocytic pathway (Reider et al., 2009;
Henne et al., 2010; Cocucci et al., 2012; Umasankar et al., 2012; Mayers et al., 2013). The muniscins
constitute one of eight families of MHD proteins in humans, and the only family whose evolutionary
origin was unexplained until now. The present study indicates not only that the muniscins are
homologous to TCUP, but also that they are the sole surviving remnants of the full TSET complex that 
existed in our pre-opisthokont ancestors. 

Conclusions 

TSET is the latest addition to a growing set of trafficking proteins that have ancient distributions, 
but are frequently lost (Schlacht et al., 2014), or in the case of TSET reduced perhaps with neofunctionalization
(Figure 3). This is consistent with the uneven distribution of the individual components
(in contrast to the all-or-nothing distribution of AP-5), the additional apparently lineage-specific 
binding partners in Dictyostelium, and the acquisition of extra domains (e.g., F-BAR in opisthokonts 
and SH3 in plants) adding lineage-specific function. 

Studies on the muniscins may help to explain the different phenotypes of TSET knockouts in 
Dictyostelium and Arabidopsis. Like Arabidopsis TSET, the muniscins interact with EH domain-containing
proteins and participate in clathrin-mediated endocytosis (Reider et al., 2009; Henne
et al., 2010; Cocucci et al., 2012; Umasankar et al., 2012; Mayers et al., 2013). Dictyostelium
has lost its TCUP MHD, and it seems likely that concomitant with this loss, it also lost some of
TSET's binding partners and functions. Nevertheless, we suspect that TSET may predate clathrin-mediated
endocytosis, for two reasons. First, AP-1 and AP-2, the two AP complexes that function
together with clathrin, are the most recent additions to the AP family (Figure 4A); and second,
TSET already has its own β-propeller + α-solenoid scaffold, so it is not clear why it would need
clathrin as well. Thus, the interaction between TSET and the clathrin pathway may have evolved 
considerably later than TSET itself, although still pre-LECA. It is tempting to speculate that TSET 
was part of the original endocytic machinery, which then became redundant in some organisms as 
the clathrin pathway took over. 

Thus, our bioinformatics tool, reverse HHpred, is able to find novel homologues of known proteins, 
and could potentially be used to identify new players both in membrane traffic and in other pathways 
(Figure 1 — figure supplement 5). Using this tool, we were able to find the four core subunits of an
ancient complex belonging to the same family as the APs and COPI. This ancient complex, TSET, is 
therefore both the answer to the question of the origin of the last set of MHD proteins in humans, 
and a major new piece of the puzzle to be incorporated alongside the other membrane-trafficking 
machinery, as we delve into the history of the eukaryotic cell. 

Materials and methods 

Construction of the 'reverse HHpred' database 

The proteomes of various organisms (detailed in Figure 1 — figure supplement 2) were downloaded 
from the National Center for Biotechnology Information archives at ftp://ftp.ncbi.nih.gov/refseq/release/. 
Figure 4. Evolution of TSET. (A) Simplified diagram of the concatenated tree for TSET, APs, and COPI, based on
Figure 4 — figure supplement 8. Numbers indicate posterior probabilities for MrBayes and PhyloBayes and
maximum-likelihood bootstrap values for PhyML and RAxML, in that order. (B) Schematic diagram of TSET.
(C) Possible evolution of the three families of heterotetramers: TSET, APs, and COPI. We propose that the
earliest ancestral complex was likely a heterotrimer or a heterohexamer formed from two identical heterotrimers,
containing large (red), small (yellow), and scaffolding (blue) subunits. All three of these proteins were
composed of known ancient building blocks of the membrane-trafficking system (Vedovato et al., 2009):
α-solenoid domains in both the large and scaffolding subunits; two β-propellers in the scaffolding subunit; and a
longin domain forming the small subunit. The gene encoding the large subunit then duplicated and mutated to
generate the two distinct types of large subunits (red and magenta), and the gene encoding the small subunit
also duplicated and mutated (yellow and orange), with one of the two proteins (orange) acquiring a μ homology
domain (MHD) to form the ancestral heterotetramer, as proposed by Boehm and Bonifacino (12). However, the
scaffolding subunit remained a homodimer. Upon diversification into three separate families, the scaffolding
subunit duplicated independently in TSET and COPI, giving rise to TTRAY1 and TTRAY2 in TSET, and to α- and
β'-COP in COPI. COPI also acquired a new subunit, ε-COP (purple). The scaffolding subunit may have been lost
in the ancestral AP complex, as indicated in the diagram; however, AP-5 is tightly associated with two other
proteins, SPG11 and SPG15, and the relationship of SPG11 and SPG15 to TTRAY/B-COPI remains unresolved, so
it is possible that SPG11 and SPG15 are highly divergent descendants of the original scaffolding subunits. The
other AP complexes are free heterotetramers when in the cytosol, but membrane-associated AP-1 and AP-2
interact with another scaffold, clathrin; and AP-3 has also been proposed to interact transiently with a protein
with similar architecture, Vps41 (Rehling et al., 1999; Cabrera et al., 2010; Asensio et al., 2013). So far no
scaffold has been proposed for AP-4. Although the order of emergence of TSET and COP relative to adaptins is
unresolved, our most recent analyses indicate that, contrary to previous reports (Hirst et al., 2011), AP-5
diverged basally within the adaptin clade, followed by AP-3, AP-4, and APs 1 and 2, all prior to the LECA. This
still suggests a primordial bridging of the secretory and phagocytic systems prior to emergence of a trans-Golgi
network. The muniscins arose much later, in ancestral opisthokonts, from a translocation of the TSET MHD- 
encoding sequence to a position immediately downstream from an F-BAR domain-encoding sequence. Another 
translocation occurred in plants, where an SH3 domain-coding sequence was inserted at the 3' end of the 
TSAUCER-coding sequence. See also Figure 4 — figure supplements 1-10. 
DOI: 10.7554/eLife.02866.020 

The following figure supplements are available for figure 4: 

Figure supplement 1. Phylogenetic analysis of TPLATE, β-COP, and β-adaptin, with TPLATE robustly excluded
from the β-COP clade.

DOI: 10.7554/eLife.02866.021 

Figure supplement 2. Phylogenetic analysis of TPLATE and β-adaptin subunits (β-COP removed) showing, with
weak support, that TPLATE is excluded from the adaptin clade. 
DOI: 10.7554/eLife.02866.022 

Figure supplement 3. Phylogenetic analysis of TSAUCER, γ-COP, and γαδεζ-adaptin subunits, with TSAUCER robustly
excluded from the γ-COP clade, and weakly excluded from the adaptin clade.
DOI: 10.7554/eLife.02866.023 

Figure supplement 4. Phylogenetic analysis of TSAUCER and γαδεζ-adaptin subunits (γ-COP removed), showing
weak support for the exclusion of TSAUCER from the adaptin clade.

DOI: 10.7554/eLife.02866.024

Figure supplement 5. Phylogenetic analysis of TCUP, δ-COP, and μ-adaptin subunits, with TCUP robustly
excluded from the δ-COP clade and weakly excluded from the adaptin clade.

DOI: 10.7554/eLife.02866.025 

Figure supplement 6. Phylogenetic analysis of TCUP and μ-adaptin subunits (δ-COP removed), showing weak
support for the exclusion of TCUP from the adaptin clade.
DOI: 10.7554/eLife.02866.026 

Figure supplement 7. Phylogenetic analysis of TSPOON with ζ-COP and σ-adaptin subunits, with moderate
support for the exclusion of TSPOON from both the COPI and adaptin clades, in addition to moderate support for
the monophyly of the TSPOON clade.
DOI: 10.7554/eLife.02866.027 

Figure supplement 8. TSET is a phylogenetically distinct lineage from F-COPI and the AP complexes. 
DOI: 10.7554/eLife.02866.028 

Figure supplement 9. Phylogenetic analysis of TTRAY1, TTRAY2, α-COP, and β'-COP.

DOI: 10.7554/eLife.02866.029 

Figure supplement 10. Muniscin family members identified by reverse HHpred, using the following PDB 
structures. 

DOI: 10.7554/eLife.02866.030 



The *.protein.faa.gz files obtained were then split into separate files, each containing one protein
sequence. These were stored such that each directory contained information from only one species
(the total number of protein 'faa' files searched for each organism was: Arabidopsis thaliana, 35270;
Caenorhabditis elegans, 23903; Dictyostelium discoideum, 13262; Dictyostelium purpureum,
12399; Drosophila melanogaster, 22256; Giardia lamblia, 6502; Homo sapiens, 32977; Micromonas
pusilla, 10269; Mus musculus, 29897; Naegleria gruberi, 15756; Physcomitrella patens, 35893;
Saccharomyces cerevisiae, 5882; Schizosaccharomyces pombe, 5004; Selaginella moellendorffii,
31312; Vitis vinifera, 23492; Volvox carteri, 14429). The latest protein data bank (pdb70), which contains
all publicly available 3D structures of proteins, was downloaded from the Gene Center Munich,
Ludwig-Maximilians-Universität (LMU) Munich via their web site at:
ftp://toolkit.lmb.uni-muenchen.de/pub/HHsearch/databases/hhsearch_dbs/. The Linux rpm version 2.0.11
of the hhsuite software was downloaded from the same website at
ftp://toolkit.lmb.uni-muenchen.de/pub/HH-suite/releases/.
Each of the faa files was then compared to the pdb70 databank using the hhsearch program from the
above suite, with default parameters. Once each protein sequence had been
tested, the output file was parsed and the hits were extracted and inserted into a MySQL database.
The database is searchable by keywords in PDB entries, and is therefore limited to searches where the
structure of a given domain has been solved. The database is accessible at
http://reversehhpred.cimr.cam.ac.uk, and searches can be initiated using keywords. Should the link become
unavailable, or if you are interested in hosting this yourself, please email jpn25@cam.ac.uk for more
information. A conceptually similar database, 'BackPhyre', has independently been generated, using
Phyre (Kelley and Sternberg, 2009) rather than HHpred as a starting point to identify homologues of
known proteins based on predicted structural similarities. Like reverse HHpred, BackPhyre is able to
find three of the four TSET subunits in Arabidopsis; however, the only eukaryotes represented in
BackPhyre are A. thaliana, D. melanogaster, H. sapiens, M. musculus, P. falciparum, and S. cerevisiae;
and without additional organisms, such as D. discoideum and N. gruberi, we would not have been
able to find the entire TSET complex.
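
The splitting step described above can be sketched as follows. This is only an illustration in Python, not the authors' script: the function names `split_fasta` and `write_per_sequence` and the output file-naming scheme are assumptions made for the example.

```python
from pathlib import Path


def split_fasta(text):
    """Split multi-FASTA text into (identifier, record) pairs.

    The identifier is the first whitespace-delimited token after '>'.
    Each record keeps its full header line followed by its sequence lines.
    """
    records = []
    header, lines = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "\n".join(lines) + "\n"))
            tokens = line[1:].split()
            header = tokens[0] if tokens else "unnamed"
            lines = [line]
        elif header is not None and line.strip():
            lines.append(line.strip())
    if header is not None:
        records.append((header, "\n".join(lines) + "\n"))
    return records


def write_per_sequence(text, species_dir):
    """Write each record to its own .faa file inside one species directory."""
    out = Path(species_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (_seq_id, record) in enumerate(split_fasta(text), start=1):
        path = out / f"seq_{index:06d}.faa"  # hypothetical naming scheme
        path.write_text(record)
        paths.append(path)
    return paths
```

Keeping one directory per species, as in the text, means that each hhsearch result can later be traced back to its organism from the file path alone.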

Data assimilation 

The large adaptor subunits share sequence and structure homology, as do the medium and small
subunits. Therefore, we were able to combine searches for novel large subunits, or for medium/small
subunits. Using the key words 'clathrin', 'adaptor', 'adapter', 'adaptin', 'AP1', 'AP2', 'AP3',
'AP4', we searched in PDB for solved structures of any large or medium/small subunit in a given
organism (11 solved structures for the large subunits and six solved structures for the medium/small
subunits were used to initiate searches [Figure 1 — figure supplement 1]). These structures
span different domains found within the subunits. For each search, a list was output of any proteins
found to contain structural homology. Included in this information are the precise amino
acids encompassing the region of similarity, the probability score, and most importantly the 'result
number'. A protein with a 'result number' of '1' means that there is no other structure in the PDB
database that it resembles more closely. Since multiple structures for the various subunits were used, we could
also factor in the number of times a particular protein was identified in a search ('repeats'). These
parameters were used as key pieces of evidence to determine how likely a hit in these searches
was to be genuine. Once the primary data were output, all other manipulations were performed in
Excel. For the large subunits there were 11 data sets (the 11 structures used to search for homologues),
and for the medium/small subunits there were six data sets. The data manipulation was
standardised at this point, and the following steps were performed to assimilate the data. The data sets
were sorted by result number to exclude anything with a result number of >50 (a result number of 50 means that
there are 49 other structures in the PDB database that this protein is more similar to). Duplicates,
where a protein was identified in multiple searches, were removed with the highest ranking (in
'result' terms) kept, and the number of times it was identified recorded in a new column ('repeats').
The results were then ordered by lowest 'Result number' and highest 'Probability' to give
a final list of proteins (Figure 1 — source data 1, 2). Generally only proteins with a 'Result number'
<10, 'Probability' >50%, at least 100 amino acids of homology ('thstt' to 'thend'), and 'Repeats' of at
least two were considered to be real hits. For ease of visualisation, only proteins with Result
number <10 or 'Repeats' >2 are shown, and other proteins of interest (e.g., FCHo1, Syp1) with
Result number <10 that did not fit the criteria listed above are greyed out. The 'IDs' have been
deduced using NCBI BLAST searches, and have not been experimentally verified. Where the identity
is ambiguous (such as the identity of a β-adaptin), a shared homology is suggested.
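
The assimilation steps above were carried out in Excel; the same logic can be sketched in Python as a rough illustration. The dictionary keys (`protein`, `result`, `probability`, `start`, `end`) and the function names are assumptions for this example and do not come from the original database schema.

```python
def assimilate(hits, max_result=50):
    """Merge hits from several structure searches into one ranked list.

    Each hit is a dict with keys 'protein', 'result' (rank of the query
    structure among all PDB matches for that protein; 1 is best) and
    'probability' (percent). Hits with result > max_result are dropped,
    duplicates are collapsed to the best-ranked hit, and the number of
    searches that found each protein is stored under 'repeats'.
    """
    best = {}
    repeats = {}
    for hit in hits:
        if hit["result"] > max_result:
            continue  # ranked too low to be considered
        name = hit["protein"]
        repeats[name] = repeats.get(name, 0) + 1
        current = best.get(name)
        if current is None or (hit["result"], -hit["probability"]) < (
            current["result"], -current["probability"]
        ):
            best[name] = hit
    merged = [dict(hit, repeats=repeats[name]) for name, hit in best.items()]
    # lowest result number first, then highest probability
    merged.sort(key=lambda h: (h["result"], -h["probability"]))
    return merged


def is_candidate(hit, max_result=10, min_probability=50.0,
                 min_length=100, min_repeats=2):
    """Apply the general thresholds quoted in the text to a merged hit."""
    length = hit["end"] - hit["start"] + 1
    return (hit["result"] < max_result
            and hit["probability"] > min_probability
            and length >= min_length
            and hit["repeats"] >= min_repeats)
```

Counting repeats only among hits that pass the result-number cut-off follows the order of operations given in the text (filter first, then remove duplicates).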

Dictyostelium: the search for TSPOON and TCUP

While searching for genes encoding potential components of the complex in four dictyostelid
genomes, we could find complete sets in Polysphondylium pallidum and Dictyostelium fasciculatum,
but one component each was missing in the databases of predicted proteins of D. discoideum (σ-like
subunit) and D. purpureum (μ-like subunit). We identified these genes by tblastn (Camacho et al.,
2009), using the most closely related orthologous sequence as query and the chromosomal sequences
as target. Gene models were created and refined using the Artemis tool (Carver et al., 2012). These
two genes have been given the DictyBase IDs DDB_G0350235 (D. discoideum TSPOON) and
DPU0040472 (D. purpureum TCUP) (www.dictybase.org).

Dictyostelium expression constructs 

The σ-like (TSPOON) coding sequence (CDS) was synthesised (GeneCust) with a BglII restriction
site inserted at its 5' end, its stop codon removed, and a SpeI site inserted at its 3' end, then
cloned into pBluescript KSII and sequenced. The CDS was then transferred into a derivative of
pDM1005 (Veltman et al., 2009) as a BglII/SpeI fragment, placing GFP at the C terminus, with
expression driven from the constitutive actin15 promoter, to generate plasmid pJH101. In addition,
the TSPOON promoter and the first 105 bases of the CDS were amplified from Ax2
genomic DNA by PCR, using primers (5'-TATCTCGAGCGTCTTCATCTTCACTATCATTTAATG-3') and
(5'-TAAAAGCTnTCATATTCACTCTGTTTCTCGTC-3'). The product was cut with XhoI/HindIII, and
the 536-bp fragment cloned into the pBluescript KSII plasmid already containing the TSPOON CDS,
via the XhoI site in the vector and the silent HindIII site introduced at nucleotide +97 of the TSPOON
CDS during its synthesis. The resulting promoter-driven TSPOON CDS was removed by digestion with
XhoI/SpeI and inserted into the corresponding sites of pDM323 and pDM450, resulting in expression
constructs containing the TSPOON CDS with GFP fused at its C terminus and driven by its own promoter
(pDT61 and pDT58, respectively).

Dictyostelium cell culture and transformation 

D. discoideum Ax2-derived strains were grown and maintained in HL5 medium (Formedium) containing
200 μg/ml dihydrostreptomycin on tissue culture treated plastic dishes, or shaken at 180 rpm,
at 22°C (Kay, 1987). Cells were transformed with expression constructs (30 μg/4 x 10' cells) by
electroporation using previously described methods (Knecht and Pang, 1995). Transformants were selected
and maintained in axenic medium supplemented with 60 μg/ml hygromycin (pDT58 and pJH101) or
20 μg/ml G418 (pDT61 and Actin15_GFP; Traynor and Kay, 2007). For the TSPOON knockout,
17.5 μg of the blasticidin disruption cassette, freed from pDT70 by digestion with ApaI and SacII, was
added to 4 x 10' Ax2 cells before electroporation. Transformants were selected and maintained in
HL5 medium containing 10 μg/ml blasticidin.

Dictyostelium microscopy and fractionation 

Cells were transformed with GFP driven by the actin 15 promoter (A15_GFP; Traynor and Kay,
2007), or with TSPOON-GFP driven by either the actin 15 promoter (A15_TSPOON-GFP) or its
own promoter (promoter_TSPOON-GFP). For microscopy, the cells were washed in KK2 (16.5 mM
KH2PO4, 3.8 mM K2HPO4, 2 mM MgSO4) at 2 x lOVml and then transferred into glass bottom
dishes (MatTek, Ashland, MA) at 1 x lOVcm^. They were either imaged immediately (vegetative)
or allowed to starve for a further 6-8 hr (developed) before imaging live on a Zeiss Axiovert 200
inverted microscope (Carl Zeiss, Jena, Germany) using a Zeiss Plan Achromat 63x oil immersion
objective (numerical aperture 1.4), an ORCA-ER2 camera (Hamamatsu, Hamamatsu, Japan), and
Improvision Openlab software (PerkinElmer, Waltham, MA). Various conditions (with or
without starvation, fixation, or pre-fixation saponin treatment) did not reveal obvious
membrane-associated labelling in cells expressing either promoter_TSPOON-GFP or A15_TSPOON-GFP.

For TIRF microscopy, TSPOON-GFP was expressed in the TSPOON null cell lines HM1725 and
HM1727 (see below), using the promoter_TSPOON-GFP plasmids pDT58 or pDT61. Transformants
were selected and maintained in 30 μg/ml hygromycin (pDT58) or 10 μg/ml G418 (pDT61). As a control,
free GFP was expressed in the null cells using the plasmid A15_GFP. Cells were harvested from
tissue culture dishes when they formed a semi-confluent monolayer and washed in KK2C (KK2 containing
0.1 mM CaCl2). Approximately 3 x 10" cells were added to 35-mm glass bottom (No. 1.5 coverglass)
microwell dishes (MatTek) containing 2.5 ml of KK2C. They were incubated at 22°C for 2 hr to
allow residual fluorescence associated with ingested axenic medium to dissipate, and 20 min before
imaging, the KK2C was replaced with fresh KK2C containing 50 μg/ml L-ascorbic acid as an antioxidant to
reduce the effects of phototoxicity. Cells were visualised using a Nikon N-STORM microscope operating
in the TIRF mode with a 100x lens (NA 1.49) and a zoom of 1.5x.

For fractionation, cells expressing A15_GFP or promoter_TSPOON-GFP were grown until they
reached a density of 2-4 x 1 0Vml in selective media, and by microscopy >50% of cells were expressing
GFP. Starting with a maximum of 8 x 10^ cells, the cells were washed in KK2 buffer and then pelleted
at 600 x g for 3 min. The cells were resuspended in PBS with a protease inhibitor cocktail (Roche),
lysed by 8 strokes of a motorized Potter-Elvehjem homogenizer followed by 5 strokes through a 21-g
needle, and centrifuged at 4100 x g for 32 min to get rid of nuclei and unbroken cells. The postnuclear
supernatant was then centrifuged at 50,000 rpm (135,700 x g RCFmax) for 30 min in a TLA-110 rotor
(Beckman Coulter) to recover the membrane pellet. The cytosolic supernatant and pellet were run
on pre-cast NUPAGE 4-12% BisTris Gels (Novex) at equal protein loadings, and Western blots were
probed with an antibody against GFP (Seaman et al., 2009).

Dictyostelium pulldowns and proteomics 

Pulldowns were performed using Dictyostelium discoideum stably expressing TSPOON-GFP under a
constitutive promoter (A15_TSPOON-GFP) or its own promoter (prom_TSPOON-GFP). Similar results were
found with both cell lines regardless of the promoter. Non-transformed cells were used as a control.
Cells were grown until they reached a density of 2-4 x lOVml in selective media, and by microscopy
>50% of cells were expressing GFP. Starting with a maximum of 8 x 1 0^ cells, they were pelleted
by centrifugation (600 x g for 2 min) and washed twice in KK2 buffer before being resuspended at
2x10' cells/ml in KK2 buffer and starved for 4-6 hr at 22°C by shaking at 180 rpm. The cells were
then pelleted at 600 x g for 3 min and lysed in 4 ml PBS 1% TX100 plus protease inhibitor cocktail
tablet (Roche) for 10 min on ice, and then spun at 20,000 x g for 15 min to get rid of debris and insoluble
material. By protein assay the resulting lysate contained 10-15 mg total protein. The lysates were
precleared using PA-sepharose for 30 min, and then immunoprecipitated using anti-GFP overnight with
rotation at 4°C. PA-sepharose was added for 60 min and then the antibody complexes were washed with PBS
1% TX100 followed by PBS before elution from beads with 100 mM Tris, 2% SDS at 60°C for 10 min. The
eluted proteins were precipitated with acetone overnight at -20°C, recovered by spinning at 15,000 x g
for 5 min, and then resuspended in sample buffer. The samples were run on pre-cast NUPAGE 4-12%
BisTris Gels (Novex), stained with SimplyBlue Safe Stain (Invitrogen) and then cut into 8 gel slices. Each
gel slice was processed by filter-aided sample preparation solution digest, and the sample was
analyzed by liquid chromatography-tandem mass spectrometry in an Orbitrap mass spectrometer
(Thermo Scientific; Waltham, MA) (Antrobus and Borner, 2011).

Proteins that came down in the non-transformed control were eliminated, as were any proteins
with fewer than 5 identified peptides, proteins that did not consistently coimmunoprecipitate in three
independent experiments, or proteins of very low abundance compared with the bait (i.e., molar ratios
of <0.002). The remaining ten proteins were considered to be specifically immunoprecipitated.
Normalized peptide intensities were used to estimate the relative abundance of the specific interactors
(iBAQ method; Schwanhäusser et al., 2011). For each protein, the values from all five
repeats were plotted, including the bait protein and GFP, which are clearly overrepresented owing to
overexpression. The relative abundances of proteins were normalized to the median abundance of all
proteins across each experiment (i.e., median set to 1.0) and values were then log-transformed and
plotted.
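
The median normalization and log transformation described above can be illustrated with a short Python sketch. The function name `normalise_and_log` is hypothetical, and the use of base-10 logarithms is an assumption, since the text does not state which base was used.

```python
import math
from statistics import median


def normalise_and_log(abundances):
    """Scale one experiment so its median abundance is 1.0, then take log10.

    `abundances` maps protein name -> positive abundance estimate
    (for example an iBAQ-style value) from a single experiment.
    The log base (10) is an assumption of this sketch.
    """
    centre = median(abundances.values())
    return {name: math.log10(value / centre)
            for name, value in abundances.items()}
```

After this transformation a protein at the median abundance sits at 0, so values from separate experiments can be plotted on a common axis.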

Dictyostelium gene disruption 

The TSPOON disruption plasmid was constructed by inserting regions amplified by PCR from
upstream and downstream of the TSPOON gene into either side of the blasticidin-resistance
cassette in pLPBLP (Faix et al., 2004). The primer pair used to amplify the 5' region was
TCP1 (5'-ACTGGGCCCTGATGTTTACCTCTCTTTGGGTCATCCCATTCTATAC-3') with a-TCP2
(5'-AAAAAGCTTTATTACCATTGTTATTGGTAATTAACAAACTATTGATC-3') and for the 3' homology
TCP3 (5'-ACCGCGGCCGCATAATTCAAAGAGGTCATTTAGATCAAGTTCAATTAG-3') with TCP4
(5'-CCTCCGCGGCTTCAGGCATTGGTTCAACTTCTTGATTATTCTCAAC-3'). The PCR products were
inserted as ApaI/HindIII and NotI/SacII fragments into the corresponding sites in pLPBLP, yielding
pDT70.

Growth of control vs mutant strains was assayed in HL5 medium, by calculating the mean generation
time, and on Klebsiella aerogenes bacterial lawns, by monitoring the expansion of a spot of 10*
cells. Spore viability was also assayed, both with and without detergent treatment, by clonally diluting
spores on bacterial lawns and counting the resultant plaques (Kay, 1982).

Endocytosis assays 

Membrane uptake was measured in real time at 22°C with 2x10' cells in 1 ml of KK2C containing
10 μM FM1-43 (Life Technologies). Briefly, a 2-ml fluorimeter cuvette containing 0.9 ml of KK2C
plus 11 μM FM1-43 was placed in the fluorimeter (PerkinElmer LS50B) with stirring set on high.
The uptake was initiated by the addition of 100 μl cells at 2 x lO'/ml in KK2C and data collected
every 1.2 s at an excitation of 470 nm (slit width 5 nm) and emission of 570 nm (slit width 10 nm)
for up to 360 s. The uptake curves were biphasic and the data were normalized against the initial
rise in fluorescence, when the cells were first added to the FM1-43, as this essentially corresponds
to the dye incorporation into the plasma membrane only (Aguado-Velasco and Bretscher, 1999).
The uptake rate was calculated from linear regression of the initial linear phase of the uptake using
GraphPad Prism software. The surface area uptake time is 1/slope of the initial phase.
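
The calculation of the surface area uptake time can be sketched as follows. The authors used GraphPad Prism for the regression; this Python example substitutes an ordinary least-squares fit, and the function names and the explicit fitting window (`start`, `end`) are assumptions for illustration only.

```python
def fit_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def surface_uptake_time(times, fluorescence, initial_rise, start, end):
    """Time (s) to internalise one plasma-membrane equivalent of dye.

    Fluorescence is divided by the initial rise, so that 1.0 corresponds
    to labelling of the plasma membrane alone; the slope of the chosen
    linear window is then in membrane equivalents per second, and the
    uptake time is its reciprocal.
    """
    window = [(t, f / initial_rise)
              for t, f in zip(times, fluorescence) if start <= t <= end]
    slope = fit_slope([t for t, _ in window], [f for _, f in window])
    return 1.0 / slope
```

For example, a normalized slope of 0.0025 per second corresponds to an uptake time of 400 s. Choosing the window that counts as the "initial linear phase" remains a judgment made by inspecting the curve.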

Fluid phase uptake was measured at 22°C using FITC-dextran 70 kDa (Sigma FD-70) by adding
2 mg/ml (final) to cells (1 x lOVml) in filtered HL5 medium that was shaken at 180 rpm. Duplicate
0.5 ml samples were taken at each time point and diluted in 1 ml of ice-cold HL5 in a microcentrifuge
tube held on iced water. Cells were pelleted, the supernatant aspirated, and the pellet washed twice
by centrifugation in 1.5 ml ice-cold wash buffer (KK2C plus 0.5% wt/vol BSA) before being lysed in 1 ml
of buffer [100 mM Tris-HCl, 0.2% (vol/vol) Triton X-100, pH 8.6] and fluorescence then determined
(excitation 490 nm, slit width 2.5 nm; emission 520 nm, slit width 10 nm). Data were normalized to
protein content (Traynor and Kay, 2007).

Comparative genomics 

Sequences from Arabidopsis thaliana, Dictyostelium discoideum, and Naegleria gruberi were obtained
with our new reverse HHpred tool. These sequences were used to build HMMs for each subunit using
HMMER v3.1b1 (http://hmmer.org). HMMs were used to search the protein databases for the organisms
in Figure 3A (see Figure 3 — source data 1 for the location of each genomic database). Sequences
identified as potential homologues were verified through reciprocal BLAST into the genomes of each
of the original three organisms. Sequences were considered homologues if they retrieved the correct
orthologue as the reciprocal best hit in at least one of the reference genomes, with an e-value at least
two orders of magnitude better than the next best hit. New sequences were incorporated into the
HMM prior to searching a new genome in order to increase the sensitivity and specificity of the HMM.
Genomic protein databases were also searched by BLAST using the closest related organism with an
identified sequence as the reference genome. Nucleotide databases (scaffolds or contigs) were also
searched using tblastn to ensure that no sequences were missed because of incomplete protein
databases. The distribution of TSET components is displayed in Coulson plot format using the Coulson
plot generator v1.5 (Field et al., 2013).
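
The reciprocal best hit criterion stated above can be expressed as a small Python check. This is a sketch under assumptions: the hit lists are taken to be (subject identifier, e-value) pairs already parsed from BLAST output, and the function and parameter names are hypothetical.

```python
def is_reciprocal_best_hit(hits, expected_id, margin_orders=2.0):
    """Check one reference genome.

    `hits` is a list of (subject_id, evalue) pairs obtained by searching
    the candidate back against a reference genome. The expected orthologue
    must be the best hit, and its e-value must be at least `margin_orders`
    orders of magnitude smaller than that of the next best distinct hit.
    """
    if not hits:
        return False
    ranked = sorted(hits, key=lambda h: h[1])
    best_id, best_e = ranked[0]
    if best_id != expected_id:
        return False
    others = [e for sid, e in ranked[1:] if sid != expected_id]
    if not others:
        return True  # nothing else was retrieved
    next_e = others[0]
    if best_e == 0.0:
        return next_e > 0.0  # an e-value of 0 beats any positive e-value
    return next_e / best_e >= 10 ** margin_orders


def is_homologue(hits_by_genome, expected_by_genome):
    """Accept if the criterion holds in at least one reference genome."""
    return any(
        is_reciprocal_best_hit(hits_by_genome.get(genome, []), expected)
        for genome, expected in expected_by_genome.items()
    )
```

Requiring success in only one of the reference genomes matches the wording of the text, and it tolerates lineages in which one reference organism has lost or strongly diverged a subunit.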

Phylogenetic analysis 

Identified sequences were combined with the adaptin and COPI sequences from Hirst et al. (2011)
into subunit-specific data sets with the intention of concatenation. Data sets were aligned using
MUSCLE v3.6 (Edgar, 2004) and masked and trimmed using Mesquite v2.75. Phylogenetic analysis
was carried out using MrBayes v3.2.2 (Ronquist and Huelsenbeck, 2003) and RAxML v7.6.3
(Stamatakis, 2006), hosted on the CIPRES web portal (Miller et al., 2010). MrBayes was run using a
mixed model with the gamma parameter until convergence (split frequency of 0.1). RAxML was run
under the LG + F + CAT model (Lartillot et al., 2009) and bootstrapped with 100 pseudoreplicates.
The resulting trees were visualized using FigTree v1.4. Initial data sets were run and long branches
were removed. Data sets were then re-aligned and re-run as above. Opisthokont adaptin and COPI
sequences were also removed from all data sets except the TCUP alignment. Data sets were
realigned and new phylogenetic analyses were carried out. Remaining sequences were used for
concatenation. Sequences were aligned and trimmed, as above, and concatenated using Geneious v7.0.6.
Subsequent phylogenetic analysis was carried out using PhyloBayes v3.3 (Lartillot et al., 2009) under the
LG + CAT model until a split frequency of 0.1 and 100 sampling points was achieved, and PhyML v3.0,
with model testing carried out using ProtTest v3.3. MrBayes and RAxML were used as above. Raw
phylogenetic trees were converted into figures using Adobe Illustrator CS4. The models of amino acid
sequence evolution are provided in Figure 3 — figure supplement 1. The database identifiers of all
sequences and their abbreviations and figure annotations are provided in Figure 3 — source data 1.
All alignments are available in Supplementary file 1.
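
The concatenation of subunit-specific alignments was performed in Geneious; the underlying operation can be sketched in Python as below. The function name `concatenate` is hypothetical, and padding a taxon that is absent from one alignment with gap characters is an assumption of this sketch, since the text does not say how missing subunits were handled.

```python
def concatenate(alignments):
    """Join several alignments end to end, one row per taxon.

    `alignments` is a list of dicts mapping taxon name -> aligned sequence.
    All sequences within one alignment must have the same length. A taxon
    missing from an alignment is padded with '-' (assumed behaviour).
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within an alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}
```

Because each input alignment is already trimmed, the concatenated rows all have the same length and can be passed directly to a tree-building program.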

Homology modeling 

The Phyre v2.0 web server (Kelley and Sternberg, 2009) was used to predict the 3D structures of
each TTRAY from A. thaliana, D. discoideum, and N. gruberi. Default settings were used for structural 
predictions, and structures were visualized using MacPyMOL (www.pymol.org). 